docs: external perf notes + verification #11

Open
web3dev1337 wants to merge 40 commits into master from feature/perf-external-notes-verification-20260305

Conversation

@web3dev1337
Owner

@web3dev1337 web3dev1337 commented Mar 5, 2026

Summary

  • Complete HYTOPIA performance framework implementation
  • SDK module: PerformanceMonitor, @monitor decorator, NetworkMetrics, CpuProfiler, bot system with 4 behaviors
  • Benchmark CLI: YAML scenario runner, headless client, baseline comparison, 5 built-in presets
  • Trace analysis: Chrome trace parser, CPU profile analyzer, spike correlator, noise filter
  • CI/CD: perf-gate + baseline-update GitHub Actions workflows
  • Research: 7 analysis docs from 80+ HyFire2 branches

What is this?

A general-purpose performance framework for ANY HYTOPIA game. Two components:

  1. SDK built-in module (~13KB): PerformanceMonitor.instance.enable() - one line to start profiling
  2. CLI tool (@hytopia/perf-tools): hytopia-bench run --preset stress - repeatable benchmarks
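
The "one line to start profiling" idea can be illustrated with a minimal sketch of an opt-in singleton. Only `PerformanceMonitor.instance.enable()` appears in this PR description; the class name `SketchMonitor`, its `monitorBlock` signature, the `durations` accessor, and the injectable clock are assumptions made for illustration and may not match the SDK module.

```typescript
// Hypothetical sketch of an opt-in singleton profiler. Not the SDK's implementation.
class SketchMonitor {
  private static shared: SketchMonitor | undefined;
  private enabled = false;
  private readonly samples = new Map<string, number[]>();
  private readonly now: () => number;

  // The clock is injectable (an assumption) so timings can be made deterministic.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Lazily created shared instance, mirroring the `.instance` accessor in the PR text.
  static get instance(): SketchMonitor {
    if (SketchMonitor.shared === undefined) {
      SketchMonitor.shared = new SketchMonitor();
    }
    return SketchMonitor.shared;
  }

  enable(): void {
    this.enabled = true;
  }

  disable(): void {
    this.enabled = false;
  }

  // Runs `fn`; records its duration under `name` only while enabled.
  monitorBlock<T>(name: string, fn: () => T): T {
    if (!this.enabled) return fn();
    const start = this.now();
    try {
      return fn();
    } finally {
      const list = this.samples.get(name) ?? [];
      list.push(this.now() - start);
      this.samples.set(name, list);
    }
  }

  // Recorded durations for one operation, oldest first.
  durations(name: string): readonly number[] {
    return this.samples.get(name) ?? [];
  }
}
```

With this shape, instrumented code paths cost one boolean check while profiling is off, which is one plausible reason to keep such a module built into the SDK.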

Test plan

  • SDK build passes (tsc + api-extractor)
  • SDK size increase ~13KB (acceptable)
  • Manual: enable PerformanceMonitor in playground, verify TICK_REPORT events
  • Manual: spawn 100 bots, verify tick stays under budget
  • Run hytopia-bench run --preset idle against sdk-example

🤖 Generated with Claude Code

web3dev1337 and others added 30 commits March 5, 2026 14:58
6-agent swarm research covering:
- HyFire2 master: 3 profiling classes, 5 monitors, 20+ scripts, 2 Python analyzers
- HyFire2 analysis/docs branches (11): 3-month retrospective, hotspot analyses, automation playbooks
- HyFire2 feature branches (18): monitoring UIs, flame charts, mobile perf, stress testers
- HyFire2 perf/fix branches (27): optimization techniques (caching, pooling, deferral)
- HYTOPIA SDK: Telemetry, PerformanceMetricsManager, DebugPanel, Stats classes, 8 gaps identified
- Headless/automation: Puppeteer scripts, Chrome trace parsers, ARM64 sim, CI/CD roadmap
- SYNTHESIS: 965-line framework spec with architecture, scenarios, metrics, 4-phase roadmap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDK Module (server/src/metrics/ + server/src/bots/):
- PerformanceMonitor: singleton profiler with CircularBuffer tick history,
  per-operation p50/p95/p99 percentiles, spike detection, entity profiling
- @monitor decorator + monitorBlock/monitorAsyncBlock helpers
- NetworkMetrics: bandwidth/packet/serialization tracking singleton
- CpuProfiler: V8 Inspector CPU profile + heap snapshot capture
- WorldLoop integration: per-tick phase timing (entities, physics, network)
- EntityManager integration: opt-in per-entity cost tracking
- Telemetry.ts: dual-path Sentry + PerformanceMonitor when enabled
- Bot system: BotPlayer, BotManager, 4 behaviors (Idle/RandomWalk/Chase/Interact)
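
A fixed-size tick history with percentile queries, as described for PerformanceMonitor above, could be sketched as follows. This is an illustration only: the class shape, the `values` method, and the choice of nearest-rank percentiles are assumptions, and the SDK's CircularBuffer may differ.

```typescript
// Hypothetical ring buffer of tick durations; overwrites the oldest sample when full.
class CircularBuffer {
  private readonly data: number[];
  private readonly capacity: number;
  private next = 0;
  private count = 0;

  constructor(capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.capacity = capacity;
    this.data = new Array<number>(capacity).fill(0);
  }

  push(value: number): void {
    this.data[this.next] = value;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Returns the stored samples ordered oldest to newest.
  values(): number[] {
    const start = this.count < this.capacity ? 0 : this.next;
    const out: number[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.data[(start + i) % this.capacity] ?? 0);
    }
    return out;
  }
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? 0;
}
```

Sorting on each query is acceptable for a report emitted once per interval; a profiler that queries every tick would likely want an incremental structure instead.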

Benchmark CLI (packages/perf-tools/):
- ScenarioLoader: YAML/JSON benchmark scenario parser
- BenchmarkRunner: orchestrates scenario execution with phases
- MetricCollector: aggregates server + client snapshots
- HeadlessClient: Puppeteer-based headless browser for client metrics
- BaselineComparer: regression detection with warning/fail thresholds
- ConsoleReporter + JsonReporter
- 5 built-in presets (idle, stress, large-world, many-players, combined)
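
Regression detection with warning and fail thresholds, as listed for BaselineComparer, can be sketched as a relative-delta check. The function name, the `Thresholds` shape, and the percentage values in the usage note are hypothetical; the commit message does not state how the real comparer is parameterized.

```typescript
type Verdict = "pass" | "warn" | "fail";

// Hypothetical threshold shape: percentage increases over baseline that trigger each level.
interface Thresholds {
  warnPct: number;
  failPct: number;
}

interface Comparison {
  deltaPct: number;
  verdict: Verdict;
}

// Compares a "lower is better" metric (e.g. tick time in ms) against its baseline.
function compareToBaseline(baseline: number, current: number, t: Thresholds): Comparison {
  // Without a usable baseline there is nothing to regress against.
  if (baseline <= 0) return { deltaPct: 0, verdict: "pass" };
  const deltaPct = ((current - baseline) / baseline) * 100;
  let verdict: Verdict = "pass";
  if (deltaPct >= t.failPct) verdict = "fail";
  else if (deltaPct >= t.warnPct) verdict = "warn";
  return { deltaPct, verdict };
}
```

For example, with assumed thresholds of `{ warnPct: 5, failPct: 20 }`, a metric moving from 100 to 108 would be reported as a warning, and one moving from 100 to 130 as a failure.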

Trace Analysis:
- TraceParser: Chrome DevTools trace → frame timings, long tasks, GC events
- CpuProfileAnalyzer: V8 .cpuprofile → hot functions, call tree, WASM mapping
- SpikeCorrelator: correlate tick spikes with entity spawns, GC, phase overruns
- NoiseFilter: IQR outlier removal, moving average, change point detection
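
The IQR outlier removal mentioned for NoiseFilter can be sketched as below. This is the textbook Tukey-fence approach with linearly interpolated quartiles and a default multiplier of 1.5; the function names and the minimum sample count are assumptions, and the actual NoiseFilter may compute quartiles differently.

```typescript
// Linearly interpolated quantile over an already sorted array (q in [0, 1]).
function quantile(sorted: readonly number[], q: number): number {
  if (sorted.length === 0) return NaN;
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  const a = sorted[lo] ?? NaN;
  const b = sorted[hi] ?? NaN;
  return a + (b - a) * (pos - lo);
}

// Drops samples outside [Q1 - k*IQR, Q3 + k*IQR], preserving the original order.
function removeOutliersIqr(samples: readonly number[], k = 1.5): number[] {
  // Too few samples for meaningful quartiles (threshold chosen for illustration).
  if (samples.length < 4) return [...samples];
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lower = q1 - k * iqr;
  const upper = q3 + k * iqr;
  return samples.filter((v) => v >= lower && v <= upper);
}
```

Filtering like this suits baseline comparison, where a single GC pause should not move the average; spike analysis would want the unfiltered series, since the outliers are the signal there.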

CI/CD:
- perf-gate.yml: PR performance gate with baseline comparison
- perf-baseline-update.yml: baseline update on master push

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All benchmarks PASS on all thresholds.

Results:
- idle:           avg=0.05ms  p99=0.15ms  mem=40.7MB  (empty server)
- stress:         avg=0.26ms  p99=1.34ms  mem=47.0MB  (100 bots: walk/chase/interact)
- entity-density: avg=0.39ms  p99=0.79ms  mem=43.7MB  (500 dynamic entities)
- block-churn:    avg=0.75ms  p99=1.66ms  mem=53.6MB  (10 clients, continuous edits)
- large-world:    avg=0.35ms  p99=0.60ms  mem=43.5MB  (boilerplate map + 20 bots)
- blocks-500k:    avg=0.16ms  p99=1.60ms  mem=51.6MB  (500K blocks, 20 WS clients)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…neck

Results from 5 heavy stress tests:
- combined (100 bots + 10 clients): ALL PASS, avg 1.73ms tick
- many-players (50 WS + 50 bots): ALL PASS, avg 0.96ms tick
- join-storm (100 clients at once): FAIL p99 71.54ms — serialize_packets bottleneck
- blocks-1m-multi-world (4x1M blocks + 40 clients): ALL PASS, avg 0.04ms tick
- blocks-10m-dense (10M blocks + 1 client): ALL PASS, avg 0.16ms tick

Key finding: join-storm exposes a chunk serialization bottleneck when
100+ clients connect simultaneously. serialize_packets avg=16.15ms,
p95=45.91ms with a max tick spike of 1253ms. This is the only scenario
that exceeded thresholds (p99 71.54ms vs 30ms limit).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ngle core)

CPU throttled to 30% of 1 core via taskset + cpulimit, simulating a
~1.5GHz single-vCPU cloud VM (vs dev machine: 9800X3D 16-thread 5GHz).

Throttled results vs unthrottled:
- idle: 0.07ms avg (was 0.05ms) — PASS
- stress (100 bots): 0.47ms avg (was 0.26ms) — PASS
- combined (100 bots + 10 clients): 2.22ms avg (was 1.73ms) — PASS
- many-players (50 WS + 50 bots): 1.46ms/11.73ms p99 (was 0.96ms/3.58ms) — PASS
- block-churn (10 clients + edits): 1.79ms/32.30ms p99 (was 0.75ms/1.66ms) — FAIL p99
- blocks-10m-dense (10M blocks): 0.37ms avg (was 0.16ms) — PASS
- join-storm (100 clients): 14.46ms avg/376ms p99 (was 4.40ms/71ms) — FAIL avg+p99

Key findings on weak hardware:
1. join-storm is catastrophic: avg tick=14.46ms (basically at budget),
   p99=376ms, serialize_packets p95=633ms. 100 concurrent joins on a
   weak server = unplayable for everyone.
2. block-churn now fails p99 (32ms vs 30ms threshold) — on-the-edge.
3. combined/stress/many-players still pass comfortably.
4. Serialization is the #1 bottleneck across all failing tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ProcessMonitor: reads /proc/<pid>/stat + status for CPU%, RSS, threads, FDs
- BenchmarkRunner: integrates ProcessMonitor, graceful PerfHarness fallback, log capture
- MetricCollector: process snapshot collection support
- ConsoleReporter: process metrics display with CPU% threshold warnings
- CLI: --no-perf-api and --log-file options
- Scripts: link-sdk.sh and setup-game.sh for external game SDK linkage
- Presets: hyfire2-bots.yaml and zoo-game-bots.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The server spawns with shell=true+detached=true, so the PID is the
shell process, not the actual node worker. Fixed by scanning /proc
for all PIDs sharing the same process group and summing their
CPU/RSS/threads/FDs.
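
The process-group fix described above can be sketched as a pure parser over the contents of `/proc/<pid>/stat` files. Field positions follow the Linux proc(5) layout (pgrp is field 5, utime 14, stime 15, num_threads 20, rss 24); the function names and result shape are hypothetical, and reading the files and counting file descriptors are left out of this sketch.

```typescript
interface ProcStat {
  pid: number;
  comm: string;
  pgrp: number;
  utimeTicks: number;
  stimeTicks: number;
  threads: number;
  rssPages: number;
}

// Parses one /proc/<pid>/stat line. The command name may itself contain spaces
// or parentheses, so the numeric fields are taken after the LAST ')'.
function parseProcStat(line: string): ProcStat | null {
  const open = line.indexOf("(");
  const close = line.lastIndexOf(")");
  if (open < 0 || close < open) return null;
  const rest = line.slice(close + 1).trim().split(/\s+/);
  if (rest.length < 22) return null;
  // rest[0] is field 3 (state), so field n lives at rest[n - 3].
  const field = (n: number): number => Number(rest[n - 3]);
  return {
    pid: Number(line.slice(0, open).trim()),
    comm: line.slice(open + 1, close),
    pgrp: field(5),
    utimeTicks: field(14),
    stimeTicks: field(15),
    threads: field(20),
    rssPages: field(24),
  };
}

// Sums CPU ticks, threads and resident pages over every process in one group.
function sumProcessGroup(statLines: readonly string[], pgrp: number) {
  let cpuTicks = 0;
  let threads = 0;
  let rssPages = 0;
  let processes = 0;
  for (const line of statLines) {
    const stat = parseProcStat(line);
    if (stat === null || stat.pgrp !== pgrp) continue;
    cpuTicks += stat.utimeTicks + stat.stimeTicks;
    threads += stat.threads;
    rssPages += stat.rssPages;
    processes += 1;
  }
  return { cpuTicks, threads, rssPages, processes };
}
```

Converting the totals to CPU% and bytes needs the clock-tick rate and page size of the host (commonly 100 Hz and 4096 bytes, but both should be queried rather than assumed).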

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server changes:
- Restore WorldMapCodec, WorldMapChunkCacheCodec, WorldMapFileLoader,
  WorldMapArtifactsGenerator from feature/map-compression branch
- Export all map codec types from SDK barrel (index.ts)
- World.loadMap() accepts string paths + compressed formats with auto-detect

Benchmark results:
- HyFire2 5v5 bots: avg tick 0.61ms, p99 1.34ms, 431MB heap, 1.2GB RSS
- Zoo Game (OS-only): avg CPU 7.3%, 746MB RSS, 24 threads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…erfHarness benchmark

Added startModelLoopedAnimations, startModelOneshotAnimations,
setModelNodeEmissiveColor, setModelNodeEmissiveIntensity to Entity.ts
for Zoo Game SDK compatibility. Zoo Game now benchmarks with full
PerfHarness: avg tick 0.25ms, p99 0.85ms, 313MB heap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds PerfBridge to the client that exposes window.__HYTOPIA_PERF__
with rich snapshots (FPS, draw calls, triangles, entities, chunks,
GLTF stats, JS heap). BenchmarkRunner now optionally launches a
headless browser via --with-client to collect client metrics alongside
server metrics. Includes client threshold support and rich console output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed waitUntil from 'networkidle2' to 'load' since the game
client maintains persistent WebSocket connections that prevent
network idle. Added try-catch around HeadlessClient launch in
BenchmarkRunner so client failures don't crash the whole benchmark.

Verified: idle benchmark with --with-client shows client metrics
(FPS, draw calls, triangles, entities, chunks, JS heap).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix HeadlessClient: ignoreHTTPSErrors, CDP cert bypass, SwiftShader GL
- Patch fetch() to strip unsupported targetAddressSpace (Chrome PNA API)
- Add warmCert step for self-signed HTTPS certs
- Fix BaselineComparer: null-safe operations, nested JSON format support
- Add console log forwarding from headless browser for debugging
- A/B results: blob shadows add +1 draw call, +6.7% frame time, +0.9% triangles

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix SwiftShader: use --enable-unsafe-swiftshader (old flag deprecated)
- Fix HTTPS certs: ignoreHTTPSErrors + CDP Security.setIgnoreCertificateErrors
- Fix fetch: patch targetAddressSpace (Chrome PNA API not available)
- Fix warmCert: pre-accept self-signed HTTPS cert before client navigation
- Add player entity spawning in perf-harness (was spectator-only)
- Add set_camera, throttle_cpu, walk_player (sendMovement via network packets)
- Add wait_for_entities action for entity load synchronization
- Add stress-walkthrough preset: 200 idle entities, deterministic positions
- Expose __HYTOPIA_GAME__ on window for headless camera/input control
- Fix BaselineComparer null-safety for operations + nested format support

A/B results (PR #2 blob shadows, 16 visible entities, SwiftShader):
- Frame time: 83ms → 152ms (+82% FAIL)
- Max draw calls: 25 → 82 (+228%)
- Max triangles: 263k → 584k (+121%)
- Bug found: TransparentSortData missing on shadow meshes (error spam)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ted mobile

Mobile simulation via CDP Emulation.setCPUThrottlingRate (rate=4).
Desktop FPS 12.8 → Mobile FPS 11.0 (baseline), 11.8 (blob shadows).
Frame time: 113ms → 123ms (+8.5% WARN) on mobile with shadows.
Server metrics unaffected (throttle is client-side only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Mar 8, 2026

Performance Benchmark Results

idle-baseline

| Metric | Value |
| --- | --- |
| Avg Tick | 0.08ms |
| P95 Tick | 0.12ms |
| P99 Tick | 0.16ms |
| Max Tick | 0.96ms |
| Over Budget | 0.0% |
| Avg Memory | 38.4MB |

stress-test

| Metric | Value |
| --- | --- |
| Avg Tick | 0.63ms |
| P95 Tick | 0.86ms |
| P99 Tick | 2.55ms |
| Max Tick | 3.32ms |
| Over Budget | 0.0% |
| Avg Memory | 63.2MB |

Generated by @hytopia/perf-tools

@github-actions

github-actions bot commented Mar 8, 2026

Performance Benchmark Results

idle-baseline

| Metric | Value |
| --- | --- |
| Avg Tick | 0.05ms |
| P95 Tick | 0.08ms |
| P99 Tick | 0.11ms |
| Max Tick | 0.74ms |
| Over Budget | 0.0% |
| Avg Memory | 39.7MB |

stress-test

| Metric | Value |
| --- | --- |
| Avg Tick | 0.45ms |
| P95 Tick | 0.57ms |
| P99 Tick | 2.10ms |
| Max Tick | 2.31ms |
| Over Budget | 0.0% |
| Avg Memory | 63.2MB |

Generated by @hytopia/perf-tools

@github-actions

github-actions bot commented Mar 8, 2026

Performance Benchmark Results

idle-baseline

| Metric | Value |
| --- | --- |
| Avg Tick | 0.06ms |
| P95 Tick | 0.10ms |
| P99 Tick | 0.15ms |
| Max Tick | 0.20ms |
| Over Budget | 0.0% |
| Avg Memory | 37.7MB |

stress-test

| Metric | Value |
| --- | --- |
| Avg Tick | 0.52ms |
| P95 Tick | 0.66ms |
| P99 Tick | 2.60ms |
| Max Tick | 4.45ms |
| Over Budget | 0.0% |
| Avg Memory | 63.8MB |

Generated by @hytopia/perf-tools

@github-actions

github-actions bot commented Mar 8, 2026

Performance Benchmark Results

idle-baseline

| Metric | Value |
| --- | --- |
| Avg Tick | 0.04ms |
| P95 Tick | 0.06ms |
| P99 Tick | 0.09ms |
| Max Tick | 0.11ms |
| Over Budget | 0.0% |
| Avg Memory | 40.0MB |

stress-test

| Metric | Value |
| --- | --- |
| Avg Tick | 0.41ms |
| P95 Tick | 0.59ms |
| P99 Tick | 2.05ms |
| Max Tick | 2.99ms |
| Over Budget | 0.0% |
| Avg Memory | 39.4MB |

Generated by @hytopia/perf-tools

